Quantifying the Informativeness of Similarity Measurements
Abstract
In this paper, we describe an unsupervised measure for quantifying the ‘informativeness’ of correlation matrices formed from the pairwise similarities or relationships among data instances. The measure quantifies the heterogeneity of the correlations and is defined as the distance between a correlation matrix and the nearest correlation matrix with constant off-diagonal entries. This non-parametric notion generalizes existing test statistics for equality of correlation coefficients by allowing for alternative distance metrics, such as the Bures and other distances from quantum information theory. For several distance and dissimilarity metrics, we derive closed-form expressions of informativeness, which can be applied as objective functions for machine learning applications. Empirically, we demonstrate that informativeness is a useful criterion for selecting kernel parameters, choosing the dimension for kernel-based nonlinear dimensionality reduction, and identifying structured graphs. We also consider the problem of finding a maximally informative correlation matrix around a target matrix, and explore parameterizing the optimization in terms of the coordinates of the sample or through a lower-dimensional embedding. In the latter case, we find that maximizing the Bures-based informativeness measure, which is maximal for centered rank-1 correlation matrices, is equivalent to minimizing a specific matrix norm, and present an algorithm to solve the minimization problem using the norm’s proximal operator. The proposed correlation denoising algorithm consistently improves spectral clustering. Overall, we find informativeness to be a novel and useful criterion for identifying non-trivial correlation structure.
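As an illustration of the definition in the abstract, the following is a minimal sketch of the measure under one particular choice of distance, the Frobenius norm. This choice, the function name `frobenius_informativeness`, and the lack of any normalization are assumptions made for this example; the paper's own closed-form expressions (for instance, the Bures-based one) are not reproduced here and may differ. Under the Frobenius norm, the nearest matrix with unit diagonal and constant off-diagonal entries uses the mean of the off-diagonal entries as that constant. For a valid (positive semidefinite) correlation matrix, this mean always lies between -1/(n-1) and 1, so the resulting constant matrix is itself a valid correlation matrix.

```python
import math


def frobenius_informativeness(K):
    """Frobenius distance from K to the nearest matrix with unit diagonal
    and constant off-diagonal entries (illustrative sketch, not the paper's code).

    K is a square correlation matrix given as a list of lists.
    """
    n = len(K)
    if n < 2 or any(len(row) != n for row in K):
        raise ValueError("K must be a square matrix with at least two rows")

    # Collect all off-diagonal entries; the diagonal (all ones) matches exactly.
    off_diag = [K[i][j] for i in range(n) for j in range(n) if i != j]

    # The least-squares optimal constant is the mean off-diagonal correlation.
    a = sum(off_diag) / len(off_diag)

    # The remaining distance is the spread of the correlations around that mean.
    return math.sqrt(sum((k - a) ** 2 for k in off_diag))


# All correlations equal: no heterogeneity, so the measure is zero.
constant = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

# Two perfectly correlated pairs that are uncorrelated with each other.
blocks = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

print(frobenius_informativeness(constant))  # 0.0
print(frobenius_informativeness(blocks))    # sqrt(8/3), about 1.63
```

The block-structured matrix scores higher than the constant one, which matches the abstract's reading of informativeness as heterogeneity of the correlations.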
Similar resources
Institutional Ownership, Business Cycles and Earnings Informativeness of Income Smoothing: Evidence from Iran
Managers engage in income smoothing either to communicate private information about future earnings to investors (informativeness hypothesis) or to distort financial performance for opportunistic purposes (opportunism hypothesis). Business cycles and the monitoring role of institutional ownership may affect the earnings informativeness of income smoothing. The purpose of this research is to exa...
The Informativeness of Reported Earnings and Characteristics of the Audit Committee
An information usefulness approach to decision making holds that information is useful only if it brings valuable messages to investors and leads to stock price adjustments. This study examines the effectiveness of audit committees in improving earnings quality and informativeness, particularly among family-owned firms. Earnings informativeness was measured through the re...
Measuring Term Informativeness in Context
Measuring term informativeness is a fundamental NLP task. Existing methods, mostly based on statistical information in corpora, do not actually measure informativeness of a term with regard to its semantic context. This paper proposes a new lightweight feature-free approach to encode term informativeness in context by leveraging web knowledge. Given a term and its context, we model contextaware...
Investigating the effect of stock price informativeness on labor investment efficiency
The managerial learning hypothesis suggests that managers can learn from the informativeness of their company's stock price, which can help improve their decision-making efficiency. According to this hypothesis, stock price informativeness can affect labor investment efficiency, since stock prices contain valuable information that managers have about the company's fu...
Fusion of Similarity Data in Clustering
Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative ma...
Journal title:
- Journal of Machine Learning Research
Volume 18 Issue
Pages -
Publication date 2017